Skip to content

fix: correct StrictMetricsEvaluator::CanContainNulls/CanContainNaNs when field missing from stats map#686

Merged
wgtmac merged 1 commit into
apache:mainfrom
sentomk:fix/strict-metrics-evaluator
May 28, 2026
Merged

fix: correct StrictMetricsEvaluator::CanContainNulls/CanContainNaNs when field missing from stats map#686
wgtmac merged 1 commit into
apache:mainfrom
sentomk:fix/strict-metrics-evaluator

Conversation

@sentomk
Copy link
Copy Markdown
Contributor

@sentomk sentomk commented May 27, 2026

Summary

  • Fix StrictMetricsEvaluator::CanContainNulls to return true (conservative) when a field is absent from a non-empty null_value_counts map, and false for required fields per schema
  • Fix StrictMetricsEvaluator::CanContainNaNs to return true (conservative) when a field is absent from a non-empty nan_value_counts map, and false for non-floating-point types
  • Add regression tests covering the missing-entry scenario

Close #685

Motivation

When null_value_counts or nan_value_counts is non-empty but does not contain an entry for the queried field, the evaluator incorrectly returned false. This caused comparison operators to erroneously return kRowsMustMatch, potentially skipping row-level filtering and returning rows that do not satisfy the predicate.

Test plan

  • All 40 existing + new StrictMetrics tests pass
  • pre-commit (clang-format, trailing whitespace, etc.) passes

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes StrictMetricsEvaluator’s handling of missing per-field entries in null_value_counts / nan_value_counts, making the evaluator conservative (i.e., assume nulls/NaNs may exist) when the stats maps are non-empty but omit the queried field, preventing incorrect kRowsMustMatch results that could skip row-level filtering.

Changes:

  • Update CanContainNulls to return true when a field is absent from a non-empty null_value_counts map, and false for required schema fields.
  • Update CanContainNaNs to return true when a field is absent from a non-empty nan_value_counts map, and false for non-floating-point fields.
  • Add regression tests for missing-entry scenarios (with one test needing adjustment to actually exercise the bug condition).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/iceberg/expression/strict_metrics_evaluator.cc Makes null/NaN containment checks conservative when stats maps omit a field, and adds required/non-floating fast paths.
src/iceberg/test/strict_metrics_evaluator_test.cc Adds regression coverage for missing null/NaN count entries (one test currently doesn’t set bounds, so it won’t regress the original issue).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/iceberg/test/strict_metrics_evaluator_test.cc Outdated
@sentomk sentomk force-pushed the fix/strict-metrics-evaluator branch 2 times, most recently from e7344ea to 68c68b3 Compare May 27, 2026 17:12
Comment thread src/iceberg/expression/strict_metrics_evaluator.cc Outdated
Comment thread src/iceberg/expression/strict_metrics_evaluator.cc
Comment thread src/iceberg/test/strict_metrics_evaluator_test.cc Outdated
@wgtmac wgtmac added the awaiting author action Author needs to address comments, resolve conflicts, answer questions, etc. label May 28, 2026
@sentomk sentomk force-pushed the fix/strict-metrics-evaluator branch from 68c68b3 to ca30ee3 Compare May 28, 2026 10:52
…from stats map

Use Schema::FindFieldById (recursive lookup) instead of
StructType::GetFieldById (top-level only). Return Result<bool> to
propagate errors. Restore comment explaining null_value_counts
semantics. Add regression tests in a standalone test suite.
@sentomk sentomk force-pushed the fix/strict-metrics-evaluator branch from ca30ee3 to 3c8284b Compare May 28, 2026 10:59
Copy link
Copy Markdown
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you, @sentomk!

@wgtmac wgtmac changed the title fix(expr): correct CanContainNulls/CanContainNaNs when field missing from stats map fix: correct StrictMetricsEvaluator::CanContainNulls/CanContainNaNs when field missing from stats map May 28, 2026
@wgtmac wgtmac merged commit d6c1da0 into apache:main May 28, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting author action Author needs to address comments, resolve conflicts, answer questions, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: StrictMetricsEvaluator returns incorrect result when null/NaN counts are missing for a field.

3 participants